Combining SFCscore with Random Forests leads to improved affinity prediction for protein-ligand complexes
Authors
Abstract
SFCscore is a collection of empirical scoring functions derived from a set of over 60 descriptors for protein-ligand complexes of known structure [1]. At the time of their derivation, the SFCscore functions were the best-performing scoring functions tested on large heterogeneous data sets, but the overall correlation was still not within the desired range. Similarly, despite the ever-increasing amount of structure and affinity data, progress in the development of empirical scoring functions has been rather moderate in recent years. More recently, however, Ballester and Mitchell [2] published a function that outperformed the then state-of-the-art scoring functions when tested against the PDBbind benchmark set [3]. This function uses relatively simple atom contact counts as descriptors and is derived with the Random Forest algorithm. Here, we present a study in which we used Random Forests to derive a new function ("SFCscoreRF") with the SFCscore descriptors as input data. Although this is not a fully non-parametric approach, the descriptors are intended to capture the physically relevant interactions more accurately. We tested the new function against the PDBbind benchmark set and the CSAR-NRC HiQ 2010 set [4] and, in addition, performed the Leave-Cluster-Out validation proposed by Kramer and Gedeck for the PDBbind set [5]. The results suggest that the new function significantly improves the predictive power of SFCscore: it increases the correlation between predicted and experimentally determined affinities for the PDBbind benchmark set from r = 0.41 (best previous SFCscore function) to r = 0.61 (SFCscoreRF), and for the CSAR data set from r = 0.38 to r = 0.53.
Published: 22 March 2013
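The two evaluation ideas named in the abstract, the correlation r between predicted and measured affinities and Leave-Cluster-Out validation, can be illustrated with a minimal sketch. This is not the authors' code: the function names `pearson_r` and `leave_cluster_out_splits` are hypothetical, the cluster labels are assumed to be given (for example, one label per protein family), and the abstract does not state which correlation coefficient was used, so Pearson's r is assumed here.

```python
from collections import defaultdict
from math import sqrt


def pearson_r(predicted, measured):
    """Pearson correlation between predicted and measured affinities."""
    if len(predicted) != len(measured) or len(predicted) < 2:
        raise ValueError("need two equal-length sequences with >= 2 values")
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_m = sum(measured) / n
    cov = sum((p - mean_p) * (m - mean_m) for p, m in zip(predicted, measured))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_m = sum((m - mean_m) ** 2 for m in measured)
    if var_p == 0 or var_m == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_p * var_m)


def leave_cluster_out_splits(cluster_ids):
    """Yield (held_out_cluster, train_indices, test_indices).

    Each cluster is held out once as the test set, and all complexes
    from the remaining clusters form the training set. Assumption:
    cluster_ids[i] is the cluster label of complex i.
    """
    by_cluster = defaultdict(list)
    for index, cluster in enumerate(cluster_ids):
        by_cluster[cluster].append(index)
    for cluster, test_indices in by_cluster.items():
        train_indices = [
            i for i, c in enumerate(cluster_ids) if c != cluster
        ]
        yield cluster, train_indices, test_indices
```

In use, a regression model (such as a Random Forest) would be fitted on the training indices of each split and its predictions for the held-out cluster passed to `pearson_r` together with the experimental values.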
Similar resources
Binding Affinity Prediction with Property-Encoded Shape Distribution Signatures
We report the use of the molecular signatures known as "property-encoded shape distributions" (PESD) together with standard support vector machine (SVM) techniques to produce validated models that can predict the binding affinity of a large number of protein ligand complexes. This "PESD-SVM" method uses PESD signatures that encode molecular shapes and property distributions on protein and ligan...
PPCM: Combing Multiple Classifiers to Improve Protein-Protein Interaction Prediction
Determining protein-protein interaction (PPI) in biological systems is of considerable importance, and prediction of PPI has become a popular research area. Although different classifiers have been developed for PPI prediction, no single classifier seems to be able to predict PPI with high confidence. We postulated that by combining individual classifiers the accuracy of PPI prediction could be...
#56 - Binding affinity prediction using a nonparametric regression model based on physicochemical and structural descriptors of the nano-environment for protein-ligand interactions
We propose a new empirical scoring function for binding affinity prediction modeled based on physicochemical and structural descriptors that characterize the nano-environment that encompass both ligand and binding pocket residues. Our hypothesis is that a more detailed characterization of protein-ligand complexes in terms of describing nano-environment as precisely as possible can lead to impro...
Development of target-biased scoring functions for protein-ligand docking
Accurate scoring of protein-ligand interactions for docking, binding-affinity prediction and virtual screening campaigns is still challenging. Despite great efforts, the performance of existing scoring functions strongly depends on the target structure under investigation. Recent developments in the direction of target-class-specific scoring methods and machine-learning-based procedures reveal s...
Binding Affinity Prediction for Protein-Ligand Complexes Based on β Contacts and B Factor
Accurate determination of protein-ligand binding affinity is a fundamental problem in biochemistry useful for many applications including drug design and protein-ligand docking. A number of scoring functions have been proposed for the prediction of protein-ligand binding affinity. However, accurate prediction is still a challenging problem because poor performance is often seen in the evaluatio...
Journal title:
Volume: 5, Issue:
Pages: -
Publication date: 2013